Neural Computation in Medicine: Perspectives and Prospects∗

Author

  • Richard Dybowski

∗ In Malmgren H, Borga M, Niklasson L (eds.), Proceedings of the ANNIMAB-1 Conference (Artificial Neural Networks in Medicine and Biology), Göteborg, 13–16 May 2000. Springer, 2000, pp. 26–36.
Abstract

In 1998, over 400 papers on artificial neural networks (ANNs) were published in the context of medicine, but why is there this interest in ANNs? And how do ANNs compare with traditional statistical methods? We propose some answers to these questions, and go on to consider the ‘black box’ issue. Finally, we briefly look at two directions in which ANNs are likely to develop, namely the use of Bayesian statistics and knowledge-data fusion.

1 Cognitive and data modelling

In the wake of publication of the back-propagation algorithm in 1986, the number of ANN-oriented articles featured in the Medline database has grown from 2 in 1990 to 473 in 1998. But why is there this interest in ANNs within medicine? The design of ANNs was originally motivated by the phenomena of learning and recognition, and the desire to model these cognitive processes. The cognitive-modelling branch of ANN research is still active, and it is relevant to medicine in providing models of psychological and cerebral dysfunction [1, 2, 3]. However, in the late 1980s, a more pragmatic stance emerged, and ANNs came to be seen also as tools for data modelling, primarily for classification. Medicine involves decision making, and classification is an integral part of that process, but medical classification tasks, such as diagnosis, can be far from straightforward. At least two sources of difficulty can be identified.

  • Clinical laboratories are being subjected to an ever-increasing workload. Much of the data received by these laboratories consists of complex figures, such as cytological specimens – objects traditionally interpreted by experts – but experts are a limited resource.

  • The complexity of patient-related data can be such that even an expert can overlook important details. With the production of high volumes of possibly high-dimensional data from instruments such as flow cytometers, and the integration of disparate data sources through data fusion, this is becoming an increasing problem.

Presented with such problems, it is natural that the medical fraternity should turn to ANNs in the hope that these adaptable models could alleviate at least some of the problems; however, the use of ANNs in medicine has raised a number of issues. One of these is the relationship between ANNs and statistics; another is the use of ‘black box’ systems in medicine. We look at both these issues and then consider two directions in which ANNs are likely to develop.

2 A statistical perspective

2.1 Multilayer perceptrons

Multilayer perceptrons (MLPs) with sigmoidal hidden-node functions are the ANNs most commonly used in medicine. Medical data are diverse and, at times, highly complex [4], but MLPs are universal approximators. The flexibility of MLPs has enabled them to be applied to a wide variety of medical fields [5, 6, 7, 8, 9, 10, 11], including oncology [12], dentistry [13], bacteriology [14], and sleep research [15]. For example, Tarassenko [16] describes how an MLP was used to associate an electroencephalogram (EEG) with deep sleep, REM sleep or wakefulness. EEGs from nine patients were digitized and, in order to reduce the dimensionality of feature space, the EEGs were characterized by the ten coefficients of an autoregressive time-series model. These coefficients were used as the input vectors for an MLP with three outputs. The number of hidden units was varied from 4 to 20, and five initially random weight vectors were used. The best-performing MLP had a misclassification rate of 10.6%; in comparison, nearest-neighbour classification gave a misclassification rate of 18.3%.
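As a rough illustration of this kind of pipeline, the sketch below trains small MLPs with different numbers of hidden units and several random initializations on autoregressive-coefficient feature vectors, keeping the best performer. It is a hypothetical reconstruction, not the code of the original study: the synthetic data and the scikit-learn calls are assumptions.

```python
# Hypothetical sketch of the EEG sleep-staging setup described above:
# 10 autoregressive coefficients per segment -> MLP with 3 outputs
# (deep sleep, REM, wakefulness). Data here are synthetic stand-ins.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 10))          # 600 segments x 10 AR coefficients
y = rng.integers(0, 3, size=600)        # 3 classes: deep sleep, REM, wake

best_score, best_h = -np.inf, None
for h in range(4, 21, 4):               # vary the number of hidden units
    for seed in range(5):               # five random weight initializations
        mlp = MLPClassifier(hidden_layer_sizes=(h,), activation="logistic",
                            max_iter=500, random_state=seed)
        score = cross_val_score(mlp, X, y, cv=5).mean()
        if score > best_score:
            best_score, best_h = score, h

print(f"best hidden-unit count: {best_h}, CV accuracy: {best_score:.3f}")
```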
The genesis and renaissance of ANNs took place within various communities, and papers published during these periods reflected the disciplines involved: biology and cognition; statistical physics; and computer science. But it was not until the early 1990s that a probability-theoretic perspective emerged, which regarded ANNs as being within the framework of statistics [17, 18, 19, 20]. A recurring theme of this literature is that many ANNs are analogous, or identical, to existing statistical techniques. For example, a popular statistical method for modelling the relationship between a binary response variable and a vector of covariates is logistic regression, but a single-layer perceptron with a logistic output function, trained with a cross-entropy error function, is functionally identical to a main-effects logistic regression model. Furthermore, an MLP can be regarded both as a non-linear extension of logistic regression and as a particular type of projection pursuit regression model. However, Ripley & Ripley [21] point out that the statistical algorithms for fitting projection pursuit regression are not as effective as those for fitting MLPs.

One may ask whether the apparent similarity between ANNs and existing statistical methods means that ANNs are redundant. One answer to this is given by Ripley [22]:

    The traditional methods of statistics and pattern recognition are either parametric, based on a family of models with a small number of parameters, or non-parametric, in which the models used are totally flexible. One of the impacts of neural network methods on pattern recognition has been to emphasize the need in large-scale practical problems for something in between: families of models with large but not unlimited flexibility given by a large number of parameters.

Another response is to point out that the widespread fascination with ANNs has attracted many talented researchers and potential users into the realm of data modelling. It is true that the neural-computing community re-discovered some statistical concepts already in existence, but this influx of participants has created new ideas and refined existing ones. These benefits include the learning of sequences by time delay and partial recurrence [23], and the creation of powerful visualization techniques, such as generative topographic mapping [24]. Thus the ANN movement has resulted in statisticians having available to them a collection of techniques to add to their repertoire. Furthermore, the placement of ANNs within a statistical framework has provided a firmer theoretical foundation for neural computation. This has led to new developments, such as the Bayesian approach to ANNs [25], and to improvements to existing neural methods [26].

ANNs can be used jointly with conventional statistical methods. In the scheme suggested by Goodman & Harrell [27], the performance metrics of a generalized linear model are compared with those from an MLP. If the two approaches are not found to differ with statistical significance, the linear model is chosen, since it is simpler with respect to computation and interpretation. The authors demonstrate this approach in the context of a coronary artery bypass dataset.
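The equivalence between a single-layer perceptron and logistic regression noted above can be made concrete with a small sketch. The data and the scikit-learn calls below are illustrative assumptions, not part of the original paper; the point is only that both fitting procedures estimate the same model, P(y = 1 | x) = sigmoid(w·x + b).

```python
# A single-layer perceptron with a logistic output trained under a
# cross-entropy loss is functionally a main-effects logistic regression:
# both estimate P(y=1|x) = sigmoid(w.x + b).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))                     # four covariates
true_w = np.array([1.5, -2.0, 0.5, 0.0])
p = 1.0 / (1.0 + np.exp(-(X @ true_w + 0.3)))
y = rng.binomial(1, p)                            # binary response

# "Statistical" fit: (near-)unpenalized logistic regression.
lr = LogisticRegression(C=1e6).fit(X, y)

# "Neural" fit: one logistic output unit trained by gradient descent
# on the cross-entropy error.
w, b = np.zeros(4), 0.0
for _ in range(5000):
    z = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad_w, grad_b = X.T @ (z - y) / len(y), np.mean(z - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

print(np.round(lr.coef_.ravel(), 2), round(lr.intercept_[0], 2))
print(np.round(w, 2), round(b, 2))   # should be close to the values above
```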
MLPs are trained using exemplars from each class of interest, but many pathologies are uncommon. In such situations, a balanced training set should be used for the under-represented classes [28]. However, this may not be feasible when abnormalities are very rare, in which case novelty detection should be considered [29]. It should be emphasized that, even with correct training, an ANN will not necessarily be the best choice for a classification task in terms of accuracy. This has been highlighted by Wyatt [30], who wrote:

    Neural net advocates claim accuracy as the major advantage. However, when a large European research project, StatLog, examined the accuracy of five ANN and 19 traditional statistical or decision-tree methods for classifying 22 sets of data, including three medical datasets [31], a neural technique was the most accurate in only one dataset, on DNA sequences. For 15 (68%) of the 22 sets, traditional statistical methods were the most accurate, and those 15 included all three medical datasets.

But one should add the comment made by Michie et al [31] on the results of the StatLog project:

    With care, neural networks perform very well as measured by error rate. They seem to provide either the best or near best predictive performance in nearly all cases ...

Nevertheless, in order to justify the implementation of an ANN, its ease of implementation and performance should be compared with those obtained from one or more appropriate standard statistical techniques. Furthermore, one should also be aware of emerging non-neural alternatives, such as support vector machines for classification and independent component analysis for visualization, and also make comparisons with these, if appropriate.

2.2 Self-organizing feature maps

The most common neural system for unsupervised training is Kohonen’s self-organizing feature map (SOFM) [32]. The aim of an SOFM is to map an input vector to one of a set of neurons arranged in a lattice, and to do so in such a way that positions in input space are topologically ordered with locations on the lattice. Hertz et al [33] liken this process to an elastic net, existing in input space, which wants to come as close as possible to the input vectors of a training set. Amongst the interesting medical applications of SOFMs are their use for the classification of craniofacial growth patterns [34], the extraction of information from electromyographic signals with regard to motor unit action potentials [35], and magnetic-resonance image segmentation [36]. With regard to the first example, an SOFM was used to extract the most relevant information from mandibular growth data; the position of a patient on the resulting map could be used to aid orthodontic diagnosis and treatment. The example by Glass & Reddick [36] demonstrates how an SOFM can be used for image segmentation. An SOFM was trained with pixels from viable tumors and necrotic tissue, as visualized by magnetic resonance images. An MLP was then trained to distinguish between these two types of tissue on the basis of the SOFM’s final input-node weights. Consequently, the SOFM-MLP combination characterized the type of tissue represented by a new pixel. This provided a non-invasive technique to assess an osteosarcoma patient’s response to chemotherapy. The so-called Growing Cell Structure [37] is related to SOFMs, and Walker et al [38] used this technique for the cytodiagnosis of breast carcinoma. In addition to performing classification, the method also enabled them to visualize how the input variables were involved in the classification.
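To make the elastic-net picture from the start of this subsection concrete, the following minimal sketch of the basic Kohonen update rule shows how the weight vector of the best-matching node and its lattice neighbours are pulled towards each presented input. The synthetic data and the learning-rate and neighbourhood schedules are assumptions, not taken from any of the cited studies.

```python
# Minimal Kohonen SOFM: a 10x10 lattice of weight vectors is pulled towards
# the data, with neighbouring lattice nodes pulled together so that nearby
# positions in input space map to nearby positions on the lattice.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))                 # training vectors (3-D input)

grid = np.array([(i, j) for i in range(10) for j in range(10)])  # lattice coords
W = rng.normal(size=(100, 3))                  # one weight vector per node

n_steps = 5000
for t in range(n_steps):
    x = X[rng.integers(len(X))]
    bmu = np.argmin(((W - x) ** 2).sum(axis=1))          # best-matching unit
    lr = 0.5 * (1 - t / n_steps)                         # decaying learning rate
    sigma = 3.0 * (1 - t / n_steps) + 0.5                # shrinking neighbourhood
    d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)           # lattice distances to BMU
    h = np.exp(-d2 / (2 * sigma ** 2))                   # neighbourhood function
    W += lr * h[:, None] * (x - W)                       # pull nodes towards x

# After training, each input can be summarized by its best-matching node.
print(np.argmin(((W - X[0]) ** 2).sum(axis=1)))
```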
Although the SOFM algorithm provides a means of visualizing the distribution of data points in input space, Bishop [39] points out that this can be weak if the data do not lie within a two-dimensional subspace of the higher-dimensional space containing the data. Another observation, made by Balakrishnan et al [40], is that Kohonen feature maps are similar to the statistical technique of k-means clustering; yet it has been our observation that many papers describing an SOFM do not compare its efficacy with that of another visualization technique applied to the same dataset.

2.3 Radial-basis function networks

The second most commonly used ANNs in medicine are radial-basis function networks (RBFNs). An RBFN can be regarded as a type of generalized linear discriminant function: a linear function of functions that permits the construction of non-linear decision surfaces. The basis functions of an RBFN define local responses (receptive fields). Typically, only some of the hidden units (basis functions) produce significant values for the final layers; this is why RBFNs are sometimes referred to as localized receptive field networks. In contrast, all the hidden units of an MLP are involved in determining the output from the network (they are said to form a distributed representation). The receptive-field approach can be advantageous when the distribution of the data in the space of input values is multimodal. Furthermore, RBFNs can be trained more quickly than MLPs [41], but the number of basis functions required grows exponentially with the number of input nodes, and an increase in the number of basis functions increases the time taken, and the amount of data required, to train an RBFN adequately.

Construction of an RBFN is a two-step process. The first step is typically performed using an unsupervised method, such as k-means clustering or an SOFM, to define the basis functions; this exploits the distribution of the classes in feature space. The second step uses supervised learning for the output-layer weights via linear optimization, which is faster than the typical training algorithms used for MLPs. The potential advantages of these two steps have been investigated for a number of medical applications. For example, an RBFN approximated well the nonlinearity of heart dynamics [42]: each basis function provided a local reconstruction of the dynamics in the space spanned by the basis functions. Other examples include the classification of cervical-tissue fluorescent spectra [43], the identification of bacteria (clustering with respect to mass spectra) [44] and spinal disorders (clustering with respect to motion data) [45]. However, even when an SOFM suggests good discrimination between the classes of interest, an RBFN may not perform as well as an MLP. For example, when the EEG classification mentioned in Section 2.1 was repeated using an RBFN in place of the MLP, the misclassification rate increased slightly to 11.6% [16]. When conditions are such that an RBFN can act as a classifier [46, 47], an advantage of the local nature of RBFNs compared with MLP classifiers is that a new set of input values that falls outside all the localized receptive fields could be flagged as not belonging to any of the classes represented; in other words, the set of input values is novel. This is a more cautious approach than the resolute classification that can occur with MLPs, in which a set of input values is always assigned to a class, irrespective of the values.
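The two-step construction described above can be sketched as follows; the synthetic data, the use of scikit-learn's k-means, the number of basis functions and the common basis width are all assumptions made for illustration, not a reconstruction of any cited application.

```python
# Two-step RBFN construction: (1) unsupervised k-means to place Gaussian
# basis functions; (2) a linear least-squares fit of the output-layer weights
# on the basis-function activations.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Synthetic two-class data with a multimodal class-0 distribution.
X0 = np.vstack([rng.normal(loc=(-2, -2), size=(100, 2)),
                rng.normal(loc=(2, 2), size=(100, 2))])
X1 = rng.normal(loc=(0, 0), size=(100, 2))
X, y = np.vstack([X0, X1]), np.r_[np.zeros(200), np.ones(100)]

# Step 1: choose basis-function centres (unsupervised).
centres = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X).cluster_centers_
width = 1.0                                        # assumed common basis width

def design_matrix(X):
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2 * width ** 2))           # Gaussian basis activations
    return np.hstack([Phi, np.ones((len(X), 1))])  # plus a bias column

# Step 2: linear optimization of the output-layer weights.
Phi = design_matrix(X)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

pred = (design_matrix(X) @ w > 0.5).astype(int)
print("training accuracy:", (pred == y).mean())
```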
2.4 Adaptive resonance theory networks

Although not a common ANN, an adaptive resonance theory (ART) network is noteworthy. The ART process [48] can be regarded as a type of hypothesis test [49]. A pattern presented at an input layer is passed to a second layer, which is interconnected to the first. The second layer makes a guess about the category to which the original pattern belongs, and this hypothetical identity is passed back to the first layer. The hypothesis is compared with the original pattern and, if it is found to be a close match, the hypothesis and the original pattern reinforce each other (resonance is said to take place). But if the hypothesis is incorrect, the second layer produces another guess. If the second layer cannot eventually provide a good match with the pattern, the original pattern is learned as the first example of a new category. Spencer et al [50] incorporated ART into a system capable of discovering temporal patterns in ICU data and predicting the onset of haemodynamic disorders. Although ART provides unsupervised learning, an extension called ARTMAP [51] combines two ART modules to enable supervised learning to take place. Harrison et al [52] describe how ARTMAP can be used to update a knowledge base; they do so in the context of the ECG diagnosis of myocardial infarction and the cytopathological diagnosis of breast lesions. In spite of resolving the stability/plasticity dilemma [53], the ART algorithms can be sensitive to noise [54]. Furthermore, Ripley [22] questions the virtue of the ART algorithms over adaptive k-means clustering [55].
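As a very rough illustration of the guess/compare/resonate-or-reset cycle described above, the sketch below implements a much-simplified ART-1-style procedure for binary patterns. The vigilance value, the choice function and the synthetic data are assumptions, and the code is not a model of any of the cited clinical systems.

```python
# Much-simplified ART-1-style clustering of binary patterns: propose a
# category, test the match against a vigilance threshold, resonate (update
# the category prototype) or reset and try the next category; if nothing
# matches, the pattern founds a new category.
import numpy as np

def art1(patterns, vigilance=0.7):
    prototypes = []                                 # one binary prototype per category
    labels = []
    for p in patterns:
        # Order candidate categories by how well they overlap the pattern.
        scores = [(proto & p).sum() / (0.5 + proto.sum()) for proto in prototypes]
        assigned = None
        for j in np.argsort(scores)[::-1]:
            match = (prototypes[j] & p).sum() / max(p.sum(), 1)
            if match >= vigilance:                  # resonance: hypothesis accepted
                prototypes[j] = prototypes[j] & p   # prototype moves towards pattern
                assigned = j
                break                               # otherwise: reset, next guess
        if assigned is None:                        # no category fits: create one
            prototypes.append(p.copy())
            assigned = len(prototypes) - 1
        labels.append(assigned)
    return labels, prototypes

rng = np.random.default_rng(4)
data = rng.integers(0, 2, size=(20, 12))            # 20 binary patterns
labels, prototypes = art1(data, vigilance=0.6)
print(labels)
```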
3 The “black-box” issue

A criticism levelled against neural networks is that they are ‘black-box’ systems [30, 56] (although this can be less of a problem for the mathematically minded [57]). By this it is meant that the manner in which a neural network derives an output value from a given feature vector is not comprehensible to the non-specialist, and that this lack of comprehension makes the output from neural networks unacceptable. There are a number of properties that we desire in a model, two of which are accuracy (the ‘closeness’ of a model’s estimated value to the true value) and interpretability. By interpretability, we mean the type of input-output relationships that can be extracted from a model and which are comprehensible to the intended users of the model. An interpretable model is advantageous for several reasons. Firstly, it could be educational by supplying a previously unknown but useful input-output summary, which, in turn, can lead to new areas of research. Secondly, it could disclose an error in the model when an input-output summary or explanation contradicts known facts.

Does the lack of interpretability, as defined above, make a model unacceptable? That depends on the purpose of the model. Suppose that the choice of a statistical model for a given problem is reasonable (on theoretical or heuristic grounds), and an extensive empirical assessment of the model (for example, by cross-validation and prospective evaluation) shows that its parameters provide an acceptable degree of accuracy over a wide range of input vectors. The use of such a model for prediction would generally be approved, subject to a performance-monitoring policy. Why not apply the same reasoning to neural networks, which are, after all, non-standard statistical models?

But suppose that we are interested in knowledge discovery, by which we mean the extraction of previously unknown but useful information from data. With a trained MLP, it is difficult to interpret the mass of weights and connections within the network, and the interactions implied by these. However, the goal of rule extraction [58] is to map the (possibly complex) associations encoded by the functions and parameters of a trained ANN to a set of comprehensible if-then rules. If successful, such a mapping would lead to an interpretable collection of statements describing the associations discovered by the ANN, which, in turn, may lead to clinical insight. Another approach to providing interpretability is to use a hybrid neuro-fuzzy system [59]. These are feed-forward networks built from if-then rules containing linguistic terms based on domain knowledge, and the membership functions associated with the fuzzy rules can be tuned to data by means of a training algorithm. Both rule extraction and hybrid neuro-fuzzy systems are responses to those clinicians unwilling to use a predictive model lacking interpretability, even when the model is highly accurate.
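One simple, ‘pedagogical’ style of rule extraction treats the trained network as an oracle and fits a shallow decision tree to its predictions, whose branches can then be read as if-then rules. The sketch below illustrates that idea only; the synthetic data, the scikit-learn calls and the surrogate-tree approach are assumptions rather than the specific method of reference [58].

```python
# Pedagogical rule extraction: train an MLP, then approximate its input-output
# behaviour with a shallow decision tree whose branches read as if-then rules.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 3))
y = ((X[:, 0] > 0) & (X[:, 1] + X[:, 2] > 0.5)).astype(int)   # hidden "rule"

mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000,
                    random_state=0).fit(X, y)

# Query the network as an oracle and fit an interpretable surrogate to it.
oracle_labels = mlp.predict(X)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, oracle_labels)

print(export_text(tree, feature_names=["x1", "x2", "x3"]))  # if-then style rules
```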
